
{2025.06}[foss/2025b] SPH-EXA-0.96.1 CUDA-12.9.1 #1453

Open
pescobar wants to merge 3 commits into EESSI:main from pescobar:sph-exa-2025.6

@pescobar
Contributor

No description provided.

@ocaisa
Member

ocaisa commented Mar 20, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
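The build command above is a flat list of `key:value` arguments, with `for:` carrying a nested, comma-separated list of `key=value` pairs. A minimal sketch of how such a comment could be split apart, for illustration only: `parse_bot_build` is a hypothetical helper, not the bot's actual parser, and it assumes arguments are separated by whitespace.

```python
def parse_bot_build(comment: str) -> dict:
    """Split a 'bot: build ...' comment into its key:value arguments.

    Hypothetical helper; the real bot may parse commands differently.
    """
    prefix = "bot: build"
    text = comment.strip()
    if not text.startswith(prefix):
        raise ValueError("not a bot build command")

    args = {}
    for token in text[len(prefix):].split():
        # Split on the first ':' only, so values may contain further colons.
        key, _, value = token.partition(":")
        args[key] = value

    # 'for' holds comma-separated key=value pairs (arch=..., accel=...).
    if "for" in args:
        args["for"] = dict(pair.split("=", 1) for pair in args["for"].split(","))
    return args


cmd = (
    "bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf "
    "for:arch=x86_64/intel/icelake,accel=nvidia/cc80"
)
parsed = parse_bot_build(cmd)
print(parsed["repo"])
print(parsed["for"])
```

Keeping `for` as a nested dictionary makes it easy to see that this request targets one CPU architecture plus one accelerator.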

@eessi-bot-surf

eessi-bot-surf bot commented Mar 20, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.03/pr_1453/20996749

date | job status | comment
Mar 20 12:25:56 UTC 2026 | submitted | job id 20996749 will be eligible to start in about 20 seconds
Mar 20 12:26:07 UTC 2026 | received | job awaits launch by Slurm scheduler
Mar 20 12:26:21 UTC 2026 | running | job 20996749 is running
Mar 21 12:26:40 UTC 2026 | finished | 🤷 UNKNOWN
  • Job results file _bot_job20996749.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Mar 21 12:26:40 UTC 2026 | test result | 🤷 UNKNOWN
  • Job test file _bot_job20996749.test does not exist in job directory, or parsing it failed.

@ocaisa
Member

ocaisa commented Mar 20, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80

@eessi-bot-surf

eessi-bot-surf bot commented Mar 20, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.03/pr_1453/20996760

date | job status | comment
Mar 20 12:27:02 UTC 2026 | submitted | job id 20996760 will be eligible to start in about 20 seconds
Mar 20 12:27:11 UTC 2026 | received | job awaits launch by Slurm scheduler
Mar 20 12:27:27 UTC 2026 | running | job 20996760 is running
Mar 20 12:29:29 UTC 2026 | finished | 😢 FAILURE
Details
✅ job output file slurm-20996760.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-icelake-accel-nvidia-cc80-17740097090.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80
no other files in tarball
Mar 20 12:29:29 UTC 2026 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (3/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (4/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/4 test case(s) from 4 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
✅ job output file slurm-20996760.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
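The build "Details" list reads like a series of pattern checks against the Slurm output file: some messages must be absent, others must be present. A rough sketch of that idea, under stated assumptions: the patterns are copied from the list, the pass/fail direction of each is inferred from the check marks, the regex for the tarball message is an interpretation of `.tar.* created!`, and `sample` is a made-up log, not the real `slurm-20996760.out`.

```python
import re

# Patterns from the bot's "Details" list. Which ones must be absent and
# which must be present is inferred from the check marks (an assumption).
MUST_BE_ABSENT = [r"FATAL:", r"ERROR:", r"FAILED:", r"required modules missing:"]
MUST_BE_PRESENT = [r"No missing installations", r"\.tar\.\S* created!"]


def check_job_output(text: str) -> dict:
    """Return {pattern: passed} for every check."""
    results = {}
    for pattern in MUST_BE_ABSENT:
        results[pattern] = re.search(pattern, text) is None
    for pattern in MUST_BE_PRESENT:
        results[pattern] = re.search(pattern, text) is not None
    return results


def verdict(results: dict) -> str:
    """A job succeeds only if every individual check passed."""
    return "SUCCESS" if all(results.values()) else "FAILURE"


# Hypothetical log excerpt, for illustration only.
sample = (
    "ERROR: Installation of SPH-EXA failed\n"
    "5 out of 60 required modules missing:\n"
    "/tmp/example.tar.zst created!\n"
)
results = check_job_output(sample)
for pattern, passed in results.items():
    print("ok  " if passed else "FAIL", pattern)
print(verdict(results))
```

Under this reading, a tarball being created is not enough for success: a single matching `ERROR:` line already turns the result into a failure.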

Comment on lines +2 to +3
- UCX-CUDA-1.19.0-GCCcore-14.3.0-CUDA-12.9.1.eb:
- UCC-CUDA-1.4.4-GCCcore-14.3.0-CUDA-12.9.1.eb:
Member

Suggested change
- UCX-CUDA-1.19.0-GCCcore-14.3.0-CUDA-12.9.1.eb:
- UCC-CUDA-1.4.4-GCCcore-14.3.0-CUDA-12.9.1.eb:

No need to include these, they will get built automatically as dependencies

@ocaisa
Member

ocaisa commented Mar 20, 2026

5 out of 60 required modules missing:

* GDRCopy/2.5-GCCcore-14.3.0 (GDRCopy-2.5-GCCcore-14.3.0.eb)
* UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1 (UCX-CUDA-1.19.0-GCCcore-14.3.0-CUDA-12.9.1.eb)
* NCCL/2.27.7-GCCcore-14.3.0-CUDA-12.9.1 (NCCL-2.27.7-GCCcore-14.3.0-CUDA-12.9.1.eb)
* UCC-CUDA/1.4.4-GCCcore-14.3.0-CUDA-12.9.1 (UCC-CUDA-1.4.4-GCCcore-14.3.0-CUDA-12.9.1.eb)
* SPH-EXA/0.96.1-foss-2025b-CUDA-12.9.1 (SPH-EXA-0.96.1-foss-2025b-CUDA-12.9.1.eb)

It's failing because when we build for a GPU, we only allow GPU dependencies to be built automatically. This build requires GDRCopy-2.5-GCCcore-14.3.0.eb, which does not need a GPU, so a separate CPU PR will have to be created for that first.
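To make the split concrete, the missing modules listed above can be sorted into GPU and CPU-only groups. This is only a sketch: the rule used here (a module counts as GPU software if `CUDA` appears in its name or version suffix) is an assumption for illustration, not the check the EESSI build scripts actually perform.

```python
# Missing modules as reported in the build output.
missing = [
    "GDRCopy/2.5-GCCcore-14.3.0",
    "UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1",
    "NCCL/2.27.7-GCCcore-14.3.0-CUDA-12.9.1",
    "UCC-CUDA/1.4.4-GCCcore-14.3.0-CUDA-12.9.1",
    "SPH-EXA/0.96.1-foss-2025b-CUDA-12.9.1",
]


def is_gpu_module(name: str) -> bool:
    # Assumed heuristic: treat anything with "CUDA" in its name as GPU software.
    return "CUDA" in name


gpu = [m for m in missing if is_gpu_module(m)]
cpu_only = [m for m in missing if not is_gpu_module(m)]

print("GPU builds allowed in this PR:", gpu)
print("needs a separate CPU PR first:", cpu_only)
```

With this heuristic, GDRCopy is the only module that lands in the CPU-only group, which matches the explanation above.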

@pescobar
Contributor Author

pescobar commented Mar 21, 2026

I have removed UCX-CUDA and UCC-CUDA from the easystack

I have also created PR #1455 with the required deps

@ocaisa
Member

ocaisa commented Mar 21, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80

@eessi-bot-surf

eessi-bot-surf bot commented Mar 21, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: intel-icelake and accelerator nvidia/cc80
Building for: x86_64/intel/icelake and accelerator nvidia/cc80
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.03/pr_1453/21031581

date | job status | comment
Mar 21 18:57:43 UTC 2026 | submitted | job id 21031581 will be eligible to start in about 20 seconds
Mar 21 18:57:53 UTC 2026 | received | job awaits launch by Slurm scheduler
Mar 21 18:58:06 UTC 2026 | running | job 21031581 is running
Mar 21 19:06:03 UTC 2026 | finished | 😢 FAILURE
Details
✅ job output file slurm-21031581.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-intel-icelake-accel-nvidia-cc80-17741199060.tar.zst
size: 0 MiB (1036743 bytes)
entries: 52
modules under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/modules/all
UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1.lua
software under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/software
UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1
reprod directories under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80/reprod
UCX-CUDA/1.19.0-GCCcore-14.3.0-CUDA-12.9.1/20260321_190213UTC
other under 2025.06/software/linux/x86_64/intel/icelake/accel/nvidia/cc80
no other files in tarball
Mar 21 19:06:03 UTC 2026 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (3/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (4/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/4 test case(s) from 4 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
✅ job output file slurm-21031581.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
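Note that the test result is reported as SUCCESS although all four test cases were skipped. A small sketch of parsing the ReFrame summary line to make that visible; the regular expression is written against the line exactly as it appears in this thread, and `parse_summary` is a hypothetical helper, not part of ReFrame or the bot.

```python
import re

# Summary line copied from the bot comment.
SUMMARY = (
    "[ PASSED ] Ran 0/4 test case(s) from 4 check(s) "
    "(0 failure(s), 4 skipped, 0 aborted)"
)

PATTERN = re.compile(
    r"Ran (?P<ran>\d+)/(?P<total>\d+) test case\(s\) "
    r"from (?P<checks>\d+) check\(s\) "
    r"\((?P<failures>\d+) failure\(s\), "
    r"(?P<skipped>\d+) skipped, (?P<aborted>\d+) aborted\)"
)


def parse_summary(line: str) -> dict:
    """Extract the counters from a ReFrame summary line as integers."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a ReFrame summary line")
    return {key: int(value) for key, value in match.groupdict().items()}


counts = parse_summary(SUMMARY)
nothing_ran = counts["ran"] == 0
print(counts)
print("no test case actually ran:", nothing_ran)
```

Here "passed" only means that nothing failed: every test case was skipped because the node exposed a single GPU.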

@pescobar
Contributor Author

A new release has been published: https://github.com/sphexa-org/sphexa/releases/tag/v0.96.2

I will update this PR with the latest version

@pescobar
Contributor Author

I have created a PR in the easyconfigs repo with the latest release: easybuilders/easybuild-easyconfigs#25606

@trz42 added the 2025.06-software.eessi.io label (2025.06 version of software.eessi.io) Mar 26, 2026